Embedding the Ulam metric into ℓ1

Authors

  • Moses Charikar
  • Robert Krauthgamer
Abstract

Edit distance is a fundamental measure of distance between strings, the extensive study of which has recently focused on computational problems such as nearest neighbor search, sketching and fast approximation. A very powerful paradigm is to map the metric space induced by the edit distance into a normed space (e.g., ℓ1) with small distortion, and then use the rich algorithmic toolkit known for normed spaces. Although the minimum distortion required to embed edit distance into ℓ1 has received a lot of attention lately, there is a large gap between known upper and lower bounds. We make progress on this question by considering large, well-structured submetrics of the edit distance metric space. Our main technical result is that the Ulam metric, namely, the edit distance on permutations of length at most n, embeds into ℓ1 with distortion O(log n). This immediately leads to sketching algorithms with constant-size sketches, and to efficient approximate nearest neighbor search algorithms, with approximation factor O(log n). The embedding and its algorithmic consequences present a big improvement over those previously known for the Ulam metric, and they are significantly better than the state of the art for edit distance in general. Further, we extend these results for the Ulam metric to edit distance on strings that are (locally) non-repetitive, i.e., strings where (close by) substrings are distinct.

ACM Classification: F.2.2, G.2.1, G.3
AMS Classification: 68P05, 68W20, 68W25
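
The objects in this abstract can be made concrete with a small Python sketch; all function names below are hypothetical, chosen only for illustration. It assumes the insertion/deletion convention for edit distance, under which two permutations of the same n symbols are at distance 2(n − LCS), and computes the longest common subsequence as a longest increasing subsequence in O(n log n) time. It also measures the distortion of a map into ℓ1 on a finite set of points as (largest expansion) × (largest contraction). The map `kendall_embedding` is a naive baseline, not the embedding constructed in the paper: its ℓ1 distance counts discordant pairs (the Kendall tau distance), which can exceed the Ulam distance by a factor of order n, far worse than the O(log n) bound stated above.

```python
from bisect import bisect_left
from itertools import combinations


def lis_length(seq):
    """Length of a longest strictly increasing subsequence (patience sorting)."""
    tails = []  # tails[k] = smallest possible last value of an increasing run of length k + 1
    for x in seq:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)


def ulam_distance(p, q):
    """Insertion/deletion edit distance between two permutations of the same symbols.

    Assumption: p and q list the same distinct symbols. Rewriting p as positions
    in q turns LCS(p, q) into an LIS, and the distance is 2 * (n - LCS).
    """
    if len(p) != len(q) or len(set(p)) != len(p) or set(p) != set(q):
        raise ValueError("p and q must be permutations of the same symbols")
    pos_in_q = {sym: i for i, sym in enumerate(q)}
    lcs = lis_length([pos_in_q[sym] for sym in p])
    return 2 * (len(p) - lcs)


def kendall_embedding(p):
    """Hypothetical naive baseline, NOT the paper's embedding: one 0/1 coordinate
    per pair of (sortable) symbols a < b, equal to 1 when a precedes b in p."""
    pos = {sym: i for i, sym in enumerate(p)}
    return [1 if pos[a] < pos[b] else 0 for a, b in combinations(sorted(p), 2)]


def l1(u, v):
    """l1 distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def empirical_distortion(points, dist, embed):
    """Distortion of `embed` on the finite set of distinct `points`:
    (max of l1/dist) * (max of dist/l1) over all pairs of points."""
    images = [embed(x) for x in points]
    expansion = contraction = 0.0
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        e = l1(images[i], images[j])
        if e == 0:
            return float("inf")  # two distinct points collapsed to one image
        expansion = max(expansion, e / d)
        contraction = max(contraction, d / e)
    return expansion * contraction


n = 8
identity = list(range(n))
moved = identity[1:] + identity[:1]   # symbol 0 moved from the front to the back
swapped = [1, 0] + identity[2:]       # one adjacent transposition
print(ulam_distance(identity, moved))                              # 2
print(l1(kendall_embedding(identity), kendall_embedding(moved)))   # 7
print(empirical_distortion([identity, moved, swapped], ulam_distance, kendall_embedding))  # 7.0
```

Moving a single symbol from the front to the back costs only 2 edits but flips n − 1 pairs, which is why pairwise-order coordinates alone give a distortion that grows linearly with n.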

Similar resources

Overcoming the ℓ1 non-embeddability barrier: algorithms for product metrics

A common approach for solving computational problems over a difficult metric space is to embed the “hard” metric into L1, which admits efficient algorithms and is thus considered an “easy” metric. This approach has proved successful or partially successful for important spaces such as the edit distance, but it also has inherent limitations: it is provably impossible to go below certain approxim...

The Computational Hardness of Estimating Edit Distance

We prove the first nontrivial communication complexity lower bound for the problem of estimating the edit distance (aka Levenshtein distance) between two strings. To the best of our knowledge, this is the first computational setting in which the complexity of estimating the edit distance is provably larger than that of Hamming distance. Our lower bound exhibits a trade-off between approximation...
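
As a reference point for the edit (Levenshtein) distance in this abstract, the textbook dynamic program below computes it exactly in quadratic time. It is a generic sketch with a hypothetical function name `levenshtein`, not a technique from the cited paper. On two permutations of the same n symbols the value lies between n − LCS and 2(n − LCS), so it agrees with the Ulam metric up to a factor of 2.

```python
def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions and substitutions
    turning s into t, via the classic O(len(s) * len(t)) dynamic program."""
    prev = list(range(len(t) + 1))  # row 0: distance from "" to each prefix of t
    for i, a in enumerate(s, start=1):
        cur = [i]  # distance from s[:i] to ""
        for j, b in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,               # delete a
                           cur[j - 1] + 1,            # insert b
                           prev[j - 1] + (a != b)))   # substitute, free if a == b
        prev = cur
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept, so memory is O(len(t)) while time stays quadratic.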

D-Width, Metric Embedding, and Their Connections

Embedding between metric spaces is a very powerful algorithmic tool and has been used for finding good approximation algorithms for several problems. In particular, embedding to an l1 norm has been used as the key step in an approximation algorithm for the sparsest cut problem. The sparsest cut problem, in turn, is the main ingredient of many algorithms that have a divide and conquer nature and...

Journal:
  • Theory of Computing

Volume: 2, Issue:

Pages: -

Publication date: 2006